Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
translated by 谷歌翻译
This paper aims to improve the Warping Planer Object Detection Network (WPOD-Net) using feature engineering to increase accuracy. What problems are solved using the Warping Object Detection Network using feature engineering? More specifically, we think that it makes sense to add knowledge about edges in the image to enhance the information for determining the license plate contour of the original WPOD-Net model. The Sobel filter has been selected experimentally and acts as a Convolutional Neural Network layer, the edge information is combined with the old information of the original network to create the final embedding vector. The proposed model was compared with the original model on a set of data that we collected for evaluation. The results are evaluated through the Quadrilateral Intersection over Union value and demonstrate that the model has a significant improvement in performance.
translated by 谷歌翻译
The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable "deep" methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer's uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter (or shield) with robust safety guarantees based on forward reachability rollouts. This shield can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety. We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety shield against worst-case model discrepancy.
translated by 谷歌翻译
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in Transformer, allowing incorporating long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
translated by 谷歌翻译
The ongoing transition from a linear (produce-use-dispose) to a circular economy poses significant challenges to current state-of-the-art information and communication technologies. In particular, the derivation of integrated, high-level views on material, process, and product streams from (real-time) data produced along value chains is challenging for several reasons. Most importantly, sufficiently rich data is often available yet not shared across company borders because of privacy concerns which make it impossible to build integrated process models that capture the interrelations between input materials, process parameters, and key performance indicators along value chains. In the current contribution, we propose a privacy-preserving, federated multivariate statistical process control (FedMSPC) framework based on Federated Principal Component Analysis (PCA) and Secure Multiparty Computation to foster the incentive for closer collaboration of stakeholders along value chains. We tested our approach on two industrial benchmark data sets - SECOM and ST-AWFD. Our empirical results demonstrate the superior fault detection capability of the proposed approach compared to standard, single-party (multiway) PCA. Furthermore, we showcase the possibility of our framework to provide privacy-preserving fault diagnosis to each data holder in the value chain to underpin the benefits of secure data sharing and federated process modeling.
translated by 谷歌翻译
这项研究介绍了我们对越南语言和语音处理任务(VLSP)挑战2021的文本处理任务的医疗保健领域的自动越南图像字幕的方法作为编码器的体系结构和长期的短期内存(LSTM)作为解码器生成句子。这些模型在不同的数据集中表现出色。我们提出的模型还具有编码器和一个解码器,但是我们在编码器中使用了SWIN变压器,LSTM与解码器中的注意模块结合在一起。该研究介绍了我们在比赛期间使用的培训实验和技术。我们的模型在vietcap4h数据集上达到了0.293的BLEU4分数,并且该分数在私人排行榜上排名3 $^{rd} $。我们的代码可以在\ url {https://git.io/jddjm}上找到。
translated by 谷歌翻译
鉴于在各种条件和背景下捕获的图像的识别药物已经变得越来越重要。已经致力于利用基于深度学习的方法来解决文献中的药丸识别问题。但是,由于药丸的外观之间的相似性很高,因此经常发生错误识别,因此识别药丸是一个挑战。为此,在本文中,我们介绍了一种名为Pika的新颖方法,该方法利用外部知识来增强药丸识别精度。具体来说,我们解决了一种实用的情况(我们称之为上下文药丸识别),旨在在患者药丸摄入量的情况下识别药丸。首先,我们提出了一种新的方法,用于建模在存在外部数据源的情况下,在这种情况下,在存在外部处方的情况下,药丸之间的隐式关联。其次,我们提出了一个基于步行的图形嵌入模型,该模型从图形空间转换为矢量空间,并提取药丸的凝结关系。第三,提供了最终框架,该框架利用基于图像的视觉和基于图的关系特征来完成药丸识别任务。在此框架内,每种药丸的视觉表示形式都映射到图形嵌入空间,然后用来通过图表执行注意力,从而产生了有助于最终分类的语义丰富的上下文矢量。据我们所知,这是第一项使用外部处方数据来建立药物之间的关联并使用此帮助信息对其进行分类的研究。皮卡(Pika)的体系结构轻巧,并且具有将识别骨架纳入任何识别骨架的灵活性。实验结果表明,通过利用外部知识图,与基线相比,PIKA可以将识别精度从4.8%提高到34.1%。
translated by 谷歌翻译
我们提出了一种针对8位神经网络加速器的新型8位量化感知训练(S8BQAT)方案。我们的方法灵感来自Lloyd-Max压缩理论,其实际适应性适应训练期间可行的计算开销。通过量化质心源自32位基线,我们使用多区域绝对余弦(MRACOS)正规器增强训练损失,该培训将重量汇总到其最近的质心,有效地充当伪压缩机。此外,引入了定期调用的硬压缩机,以通过模拟运行时模型重量量化来提高收敛速率。我们将S8BQAT应用于语音识别任务,使用经常性神经网络TransDucer(RNN-T)体系结构。使用S8BQAT,我们能够将模型参数大小增加,以将单词错误率相对降低4-16%,同时仍将延迟提高5%。
translated by 谷歌翻译
算法追索权旨在推荐提供丰富的反馈,以推翻不利的机器学习决策。我们在本文中介绍了贝叶斯追索权,这是一种模型不足的追索权,可最大程度地减少后验概率比值比。此外,我们介绍了其最小的稳健对应物,目的是对抗机器学习模型参数的未来变化。强大的对应物明确考虑了使用最佳传输(Wasserstein)距离规定的高斯混合物中数据的扰动。我们表明,可以将最终的最差目标函数分解为求解一系列二维优化子问题,因此,最小值追索问题发现问题可用于梯度下降算法。与现有的生成健壮的回流的方法相反,可靠的贝叶斯追索不需要线性近似步骤。数值实验证明了我们提出的稳健贝叶斯追索权面临模型转移的有效性。我们的代码可在https://github.com/vinairesearch/robust-bayesian-recourse上找到。
translated by 谷歌翻译
问题回答(QA)是信息检索和信息提取领域内的一项自然理解任务,由于基于机器阅读理解的模型的强劲发展,近年来,近年来,近年来的计算语言学和人工智能研究社区引起了很多关注。基于读者的质量检查系统是一种高级搜索引擎,可以使用机器阅读理解(MRC)技术在开放域或特定领域特定文本中找到正确的查询或问题的答案。 MRC和QA系统中的数据资源和机器学习方法的大多数进步尤其是在两种资源丰富的语言中显着开发的,例如英语和中文。像越南人这样的低资源语言见证了关于质量检查系统的稀缺研究。本文介绍了XLMRQA,这是第一个在基于Wikipedia的文本知识源(使用UIT-Viquad语料库)上使用基于变压器的读取器的越南质量检查系统,使用深​​层神经网络模型优于DRQA和BERTSERINI,优于两个可靠的QA系统分别为24.46%和6.28%。从三个系统获得的结果中,我们分析了问题类型对质量检查系统性能的影响。
translated by 谷歌翻译